Newest 'python web-scraping' Questions

-4 votes

0 answers

34 views

Python Selenium FireFox (Geckodriver) - Script runs on Windows but fails on Linux server (TimeoutError) EXE or python script both are not working

I have written a Python script for web scraping using Selenium with Firefox (Geckodriver). The script runs perfectly on Windows, but when I run it on Linux — either as a Python script or packaged as ...

Tharun 003

1

asked yesterday

-2 votes

1 answer

71 views

Issues with Automated Twitter Account Creation Bot in Python (Playwright) - Unable to Find "Authenticate" Button

I'm developing a bot in Python to automate the account creation process on Twitter (X) using Playwright, but I am consistently facing issues in certain steps, especially when trying to find and click ...

Paulo victor

1

asked 2 days ago

1 vote

2 answers

110 views

Can't close cookie pop up on website with selenium webdriver

I am trying to use selenium to click the Accept all or Reject all button on a cookie pop up for the website autotrader.co.uk, but I cannot get it to make the pop up disappear for some reason. This ...

teeeeee

771

asked 2 days ago

0 votes

0 answers

35 views

How to scrape tweet/thread and its replies based on conversation_id [closed]

I’m currently working on a project that involves scraping a single tweet and all its replies using tweet-harvest with an auth_token. Everything works fine, but I recently ran into an issue where I can ...

Irsyad Muhamad Firdaus

1

asked Apr 24 at 2:06

2 votes

0 answers

60 views

python-requests-html render inconsistent result

Background: by default the website shows only a few names, and there's a "moreBtn" to generate the full list. Code idea: create an HTML session, render with a script clicking the "moreBtn" ...

Beginner

31

asked Apr 24 at 1:06

-1 votes

2 answers

37 views

Why am I getting no data using BeautifulSoup and requests when scraping a news website?

import requests from bs4 import BeautifulSoup url = "https://example-news-site.com" headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" } response =...

sahzia buno

1

asked Apr 23 at 13:45

-1 votes

2 answers

76 views

I wanted to get the number of playoff games of a list of 200 players from Basketball Reference. The code I wrote is giving me 0 value for all players [closed]

I want to get the number of playoff games played by a list of players. To do that I used Selenium and Beautiful Soup. The result is being saved in a csv file but the value for each of the players is ...

Priyankan Datta

11

asked Apr 22 at 10:52

1 vote

2 answers

51 views

Importing geographic data with WFS works on Chrome but not on Python

I am trying to pull a geojson file from here. The JSON appears as expected when I paste that link into Chrome or Safari. However, I get the following error every time when I run the following code on ...

opposity

121

asked Apr 21 at 8:21

-3 votes

0 answers

27 views

I tried to parse the page but always get duplicate texts [closed]

I can get the parsed text of multiple pages. However, it contains duplicate paragraph text. For example, the content of the first page is parsed 3 times in total. I use the python code and I ...

Kevin Hong

5

asked Apr 20 at 8:35

-1 votes

0 answers

72 views

How to scrape the full New York Times article content using Selenium and BeautifulSoup without triggering the "Please enable JavaScript" message?

I'm building a scraper that fetches full article content from the New York Times using both the Article Search API and a hybrid static + Selenium-based HTML scraper. My goal is to extract complete ...

Abhishek Joshi

15

asked Apr 18 at 6:52

1 vote

2 answers

67 views

How to detect and scrape a specific language version of a multilingual publication, if available?

I wrote a Python script for scraping data from the WHO website. I wanted to retrieve the title, author name, date, PDF link, and child page link from the parent page (I applied some filters on the parent page). I am ...

Mann Jain

11

asked Apr 17 at 4:42

-1 votes

0 answers

34 views

Scrapy: "RuntimeError: Engine Not Running" when I try to run my spider after installing Scrapy-Playwright

Background: I just installed scrapy-playwright on my virtual environment in order to scrape a website that renders some links I need with Javascript. The installation went well, but when I ran my ...

Ryan_Brusseau

5

asked Apr 16 at 17:12

0 votes

0 answers

55 views

Crawl4AI token threshold not applied to raw html in arun

Here’s a brief overview of what I want to achieve: extract raw HTML and save it; use Crawl4AI to produce a ‘cleaner’ and smaller HTML that has a lot of information, including what I will eventually ...

Leksa99

117

asked Apr 13 at 13:10

-3 votes

1 answer

49 views

How to switch to a popup cookie consent page?

I'm using Python 3.12.3, Selenium 4.31.0, and the Firefox driver on Ubuntu 24.04. When I try to open a URL, a cookie consent popup appears, offering to continue without accepting, accept, or see more options. How can I ...

Michael

117

asked Apr 12 at 12:04

0 votes

0 answers

56 views

Extract span values using BS4

I'm trying to extract "Date Applied" and "17 Apr 2025 06:00", from html below: <span class="labels" part="text-and-icon-labels"> <slot part="...

rauerfc

1

asked Apr 10 at 19:35
